A simple theory of protein folding kinetics 
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We present a simple model of protein folding dynamics that captures key qualitative elements 
recently seen in all-atom simulations. The goals of this theory are to serve as a simple formalism for 
gaining deeper insight into the physical properties seen in detailed simulations as well as to serve 
as a model to easily compare why these simulations suggest a different kinetic mechanism than 
previous simple models. Specifically, we find that non-native contacts play a key role in determining 
the mechanism, which can shift dramatically as the energetic strength of non-native interactions is 
changed. For protein-like non-native interactions, our model finds that the native state is a kinetic 
hub, connecting the strength of relevant interactions directly to the nature of folding kinetics. 

PACS numbers: 



Introduction. — Protein folding has been an important 
problem at the crossroads of statistical mechanics, com- 
puter simulation, and biophysics. There has been a long 
history of theoretical advances in the study ofprotein 
folding, and we refer the reader to reviews [lH4j|. Re- 
cent advances in computer simulations have enabled one 
to use detailed, atomistic models to simulate the com- 
plete process of folding, on relatively long (millisecond) 
timescales [5(. This has become possible due to the ad- 
vent of Markov State Models (MSMs) (see Refs. 0, % 
for recent reviews), an approach which uses detailed sim- 
ulation to construct a Master equation for the statistical 
dynamics of a particular protein. 

By examining and comparing MSMs for different pro- 
teins (as well as by direct examination of simulations), 
some surprises have emerged. Perhaps most importantly, 
the role of non-native contacts has now been highlighted 
as a key part ofprotein folding [5, 7]; in hindsight, this is 
natural, since amino acid interactions are not particularly 
specific and there often is little free energetic difference 
between say a native-like interaction between two aro- 
matic residues vs a non-native like interaction |5j. This 
opens a new door to re-examine simple models of pro- 
tein folding and to develop a new theoretical formalism 
to more naturally include non-native interactions. 

In this work, we take a Master equation approach 
to dynamics, much like one does computationally with 
MSMs. The key question is how to model the rate ma- 
trix. Previous models for protein folding kinetics [3, |9( 
(derived from models of spin glass dynamics |ld| ) have 
made very simple approximations for the nature of the 
rate matrix. Here, we develop a new theoretical frame- 
work for the rate matrix which allows for a more detailed 
model of kinetics, especially the natural inclusion of non- 
native interactions. The application of this model to var- 
ious regimes of protein-sequences allows one to make a 
direct connection to recent simulations [5| and, via a sim- 
ple, solvable analytic model, describe the essence of na- 
tive hubs (i.e. transitions between non-native states occur 
via the native state) recently seen in simulation [7j. 



Model. — We first introduce a phenomenological Hamil- 
tonian for the energy of structure a: 
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where Cg is the contact map of structure (microstate) 
a (i.e. either 1 if a contact between residues i and j is 
present in the structure a or otherwise) and CH is the 
contact map of the native state. We choose the native 
contacts and non-native contacts to have differing ener- 
getic contributions (e^v and €nn, respectively), noting 
that these quantities could naturally be negative (espe- 
cially ejy). Note that we are including terms such as 
solvent entropy in the "energy" term above. 



This Hamiltonian reduces to ri c 
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ejv — £nn is the extra energetic preference for native over 
non-native contacts and q a ^ = Y*. ■ C" C, ■ is the number 
of contacts in common between structures a and f3 (also 
note that thus q a,a is shorthand for the total number of 
contacts in structure a). This Hamiltonian is simple, yet 
captures the main element of interest in this work: the 
interplay between native and non-native interactions. 

However, in order to make a connection to more de- 
tailed calculations, it is useful to note that a simi- 
lar expression can be derived from more direct physi- 
cal grounds, i.e. from a microscopic Hamiltonian with 
the explicit concept of sequence design [2j. Starting 
from the more general, microscopic contact Hamiltonian 
H a = J2ij CijBsi,8j , where Bjj is a general matrix of 
monomer-monomer interactions (and we use capital let- 
ters to designate the space of different types of residues). 
This is analogous to problems that have already been 
solved [2j, yielding (to lowest order in 1/Td) the effective 
Hamiltonian for a is 
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where Td is the design temperature (lower T d means bet- 
ter optimized sequences), and we see terms involving 



the mean (B) and the variance (B%) of the Bjj ma- 
trix take the roles of of ejvw and e Xl respectively, in 
our phenomenological Hamiltonian. Having both rep- 
resentations (i.e. the phenomenological as well as the 
microscopic Hamiltonian) allows us also to make a natu- 
ral connection to previous work [2j . We will return to the 
sequence-based Hamiltonian results at the end, in order 
to make a more direct connection to protein biophysics. 

Kinetics formalism. — To build a kinetic model, we con- 
sider the master equation: 
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which means that we must consider the nature of the 
rate matrix k a p . To more easily see the impact of these 
elemental rates on the overall dynamics, we propose a 
simple model for the rate matrix: a block-diagonal ma- 
trix with a block diagonal form of n blocks each of size 
to rows but now with one additional row for the folded 
(native) state. Specifically, with elements of the form 
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within a nonnative block 
between nonnative blocks 
from non— native to native 
from native to non — native 
on diagonal 



(4) 

Here, the n blocks are meant to represent n mestastablc 
states, each consisting of to highly related conformations. 
Note that this matrix obeys detailed balance, although 
we have made the simplifying approximation that the free 
energy of states within a block are similar and the free 
energy between non-native blocks is also similar. 

This rate matrix has a well defined set of degener- 
ate eigenvalues: a non-degenerate eigenvalue of (the 
equilibrium eigenvalue), a (n — l)-fold degenerate eigen- 
value of kq = nmko + /cjvo (for transitions between non- 
native blocks), a [n(m — l)]-fold degenerate eigenvalue of 
m(n — l)fco + mk\ + fc/vo) (for transitions within a block), 
and a non-degerate eigenvalue of kn = nmkoN + ^tvo (for 
transitions to the native state). We note in passing that 
the degeneracy in the eigenvalues seen here is naturally 
broken by some small variations in rates between states 
(i.e. the rates between all non-native states would not be 
all exactly k$). Also, variations in the value of to between 
blocks do not change the results discussed below. 

In the large n limit, the ratio of rates of transforma- 
tions between non-native blocks vs those from non-native 
to native will be kq/kn — (nmko + kjsio) / {nmkoN + 
fcjvo) ~ ko/koN- Thus, our primary goal will be to com- 
pare these elemental rates. To do so, we take a Kramer's 
approximation for dynamics between a and j3, i.e. 



where k is the microscopic rate of interconversion, F a is 
the free energy of a, and F*g is the free energy of the 
transition state (denoted by {) between a and j3 (note 
that this is not the global transition state, but just the 
transition state between structures a and /3). Since the 
eigenvalues and eigenvectors of the k a p matrix define the 
relevant timescales and dynamics, respectively, the next 
step is to flesh out this matrix in more detail. 

There have been previous approximations to model 
k a p, notably setting k a p = kexp(—Ep/T) [10( or k a p — 
kexp(E a /T) |ll| . which yielding solvable models within 
the Random Energy Model (REM) leading to power- 
law [10( and stretched exponential [11| relaxation, re- 
spectively. Also of note is an extension to GREM [8]. 
However, there are two key limitations to these methods. 
First, they only directly apply to a theory for kinetics of 
random sequences. Second, as we argue below, by consid- 
ering the structure of the transition state for transitions 
explictly, we can improve upon the previous models. 

We propose that the transition state between struc- 
tures a and /3 can be approximated in terms of the con- 
tacts in common between these structures: 
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(here, we take advantage of the fact that contacts are 
either valued at or 1, so multiplication works like a 
binary AND operator). Physically, this models the tran- 
sition between a and f3 as breaking the contacts in a not 
present in ft and then forming the contacts present in j3 
that were not originally present in a. We note in pass- 
ing that this approach is potentially broadly applicable 
to a range of problems, whose state information can be 
encapsulated into a binary vector analogous to C"- . 

Eq ([S]) is an advance over previous work in two ways. 
First, we consider directly the microscopic transitions be- 
tween states, i.e. not considering these transitions in 
terms of all going through a single barrier, but many 
different pair-wise barriers. Second, we look directly to 
structural properties of the state to determine the na- 
ture of the transition state structure. However, we stress 
that eq ([6]) is most appropriate for transitions between 
collapsed (or mostly collapsed) states. 

To calculate the free energy as a function of a given 
state, we must include the energy (from the Hamiltonian 
above) as well as a model for the polymeric entropy. We 
follow the model described in [l2| and say that for a chain 
of N persistence lengths, the number of contacts present 
(q a ' a ) in the structure a lead to q aa loops, each of length 
£ ~ N/q a ' a ; these loops each contribute an entropy of 
AS l ° op /k B = -o-+(3/2)ln 9 Q < Q ,wherecr = s-(3/2)ln7V 
and s is a positive quantity related to the flexibility of 
the chain and ks is Boltzmann's constant; note that the 
value of AS 1 ^ 0015 (eg as seen in lattice model calculations 
[2j) is dominated by the a term. 



This leads to the transition state energy 

E afi = e N 2_, C ij C ij C ij + €N N z2 C iJ C V ^ ~ G ij) 
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where q a >P> N = V C%Cf.CE is the three-conformation 
overlap between a, /3, and the native state. Combining 
terms above (and including the entropy), we get 
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where we have defined the effective free energy of contact 
formation Jnn = ^wiv + k B Ta. With this barrier height 
now directly connected to properties of the Hamiltonian, 
we are now ready to examine specific models for folding. 

Exploring the model. — To examine the model, we con- 
sider some limiting cases below. First, we wish to con- 
sider a model for protein-like sequences, i.e. those which 
resulted from evolution (or alternatively protein sequence 
design). In order to obtain reasonable estimates for 
the key transition rates in our model, i.e. fco = k a p 
vs koN = k a N (where a and /3 are representative non- 
folded structures and N is the native state), we look 
to equations (J3|), ©, and flSJ). First, we consider the 
case cn < cmn < 0, i.e. both native and non-native 
contacts are energetically preferred, but native more so. 
The intrastate conversion rate will be the fastest rate, 
since the states are very similar (i.e. q a ^ » q a ' a and 



q a,N ), which leads to a very low barrier height 



•I 

from eq ©. In order to compare fcojv to /coj consider 
that both £jv and ejvjv are negative, but the drive to 
form native contacts is stronger; thus, the barrier height 
determined in eq (|HJ) will be lower for transitions to N. 

More specifically, it is instructive to compare the bar- 
rier transitions from a to some other non-folded struc- 
ture j3 vs folding from a to iV. The difference in bar- 
rier heights AAF* = {Fl - F a ) - (F* N - F a ) = 
AA£t - TAAS* = -k B Tln(k a p/k aN ) is given by the 
combination of an energetic 

AAE t = _ ex ( g «^ _ qa ,0, N) _ eNN ( qa ,N _ q a,/3 } (Q) 

and entropic AAS* = (S^ - S a ) - (Sl N - S a ) 



TAAS t = -k B Ta(q a ' N - q a ^) 
{3/2)k B T(q a > N \nq' 
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contributions. 

Consider the regime where both e x and Jnn are neg- 
ative quantities. Note that Jnn represents the effective 
free energetic drive to form contacts in general and is 
negative when there is a sufficient general attraction that 
beats out the loss of entropy of forming contacts. Since 



three-body overlaps are much more rare than two-body 
overlaps in the unfolded state, then q a ' N > q a >P' N , Fi- 
nally, due to the nature of our Hamiltonian, it would 
be rare for two structures at random to have more con- 
tacts in common with each other than with the native 
state; thus, we can make the approximation that at least 
q a ' N w q a '$ , or more likely q a ' N > q a 'P . Thus, putting 
this all together, we find that, for this regime, AAF* 
is positive, which yields fcojv > &o- As the strength of 
non-native interactions gets more attractive (i.e. cnn 
more negative) and the difference between native and 
non-native strength grows (i.e. €n and cnn negative and 
e x < 0), the more that fcow is greater than /c > emphasiz- 
ing this effect. 

Moreover, since the different collapsed globules have 
roughly the same number of total contacts, the rates 
for unfolded state globule folding to the native state are 
largely uniform (justifying our model as constructed in 
Eq ([l])), yielding single exponential kinetics even though 
there are many parallel pathways. 

Due to the exponential nature of the relationship be- 
tween rates and free energies, this leads to the relation- 
ship fc 7v > k . What are the implications of this? In 
this limit, we would find that folding to the native state 
is fast, compared to dynamics from one non- folded state 
to another. This makes the folded state a kinetic hub, i.e. 
transitions between states are typically mediated through 
the native state. Moreover, generalizations of this model 
with a higher level hierarchy show (as could be seen from 
numerical solutions of this model) that the native state is 
a kinetic hub, i.e. transitions between non-native states 
usually go through the native state. 

Finally, we use the previous model to examine another 
regime, where native contacts dominate, i.e. e^v is neg- 
ative but /nn is positive. In this case, native contacts 
are strongly preferred and non-native contacts are dis- 
couraged (eg with excluded volume repulsion and no at- 
traction due to interactions, as is common in computer 
simulations of Go models [3] ) . Specifically, as we see di- 
rectly from eq ([9]), when Jnn is sufficiently positive, we 
obtain a negative value for AAF$. Thus, this would lead 
to the model parameters of the form fco 3> &oat. 

Physically, interconversion between non-native states 
is fast, since they are not separated by barriers (their 
transition state energies have no contributions from the 
free energy of breaking non- native contacts). This regime 
leads to a very different picture (akin to the "smooth 
energy landscape" picture previously described [3j), in 
which the unfolded state interconverts quickly in general, 
waiting for the rare chance to fold. 

Discussion and Conclusions. — While previous theoret- 
ical approaches [a, |9|] have made seminal contributions 
to the theoretical framework for understanding folding, 
these models did not model the transition state struc- 
turally, which has particularly important implications for 




FIG. 1: Two different kinetic regimes result from our theory, 
as demonstrated in a simple numerical example with n = 3 
blocks and m = 2 states per block, plus the native state (7 
states total); see eq. @ for details of the rate matrix struc- 
ture. The theory presented here is used to calculate the mean 
first passage time (MFPT) between states; edges are shown 
as solid lines if the MFPT is fast (< 30/fc), with the MFPT 
of an edge listed to one signficant digit. In both examples, 
we set fcjvo = &oiv exp(— 8/0.6) and fci = Ik. a) For estimates 
based on the MJ matrix regime (feojv — 0.05&, ko = O.OOlfc), 
we see that the native state (N) is a kinetic hub. b) For a Go 
model regime (fcojv = 0.005A;, ko = 0.5k) we see that there is 
fast interconversion between unfolded states (1-6), with slow 
interconversion to the native state (shown by dotted lines). 



the impact of non-native interactions. As we have seen 
above, the inclusion of non-native interactions critically 
changes the qualitative behavior of the model; indeed, 
this regime has been shown to be particularly relevant in 
recent all-atom protein folding simulation. 

Moreover, we can derive estimates for our parame- 
ters from previous studies of proteins. For example, 
the Miyazawa and Jernigan 13[ matrix's mean (B ps 
-3.2k B T) and a standard deviation ((Bf) 1/2 « 1.5k B T), 
respectively. These results, consistent with other such 
estimates (such as amino acid solvation free energies 
13j). combined with estimates that kTd ps O.&ksT and 
k B Ts ps 1.5ksT [2| indicate that /nn < 0; our the- 
ory therefore predicts that proteins would fold with the 
native state as a kinetic hub (i.e. fast folding to the na- 
tive state, compared to equilibration between unfolded 
states) [7|, depicted in Fig. 1. These averages are formally 
weighted by the amino acid composition |2j of the protein 
sequence and a uniform composition is used above. 

We also note in passing that while an overall collapsed 
model is handled by our theory, the limiting case of min- 
imal (or repulsive) non- native interactions is not, since 
that regime would not lead to collapsed configurations 
and thus eq ((5]) would not be valid; however, this regime 
is already well understood: the preponderance of con- 
tacts are native in this case, and thus folding proceeds 
by the formation of these contacts, as seen previously |2j. 

With this new formalism, we are able to recover and 
potentially explain the behavior seen in all-atom simu- 
lations [g, lZ(. Specifically, we get the primary result 



that the dynamics of interconversion from one non- folded 
state to another can be very slow. This also leads to a 
secondary result that native state is a kinetic hub when 
there is some non-native attraction. This suggests that 
simple, previous choices for a single dimensional reaction 
coordinate (such as using the number of native contacts) 
can lead to a misconception in terms of the fundamen- 
tal dynamics of proteins, since these approaches assume 
that the unfolded state is rapidly interconverting. This 
is correct for some Go models of protein folding, but not 
for models which include non-native attraction. 

Finally, we stress that the property of the unfolded 
state predicted from this theory does not apply to the 
chemically denatured state, in which most experiments 
probing the "unfolded state" of proteins have been per- 
formed; our theory predicts that experiments directly ex- 
amining the true unfolded state will see a much slower 
relaxation time compared to the denatured state. 

To conclude, one of the motivations of this work was to 
develop a model which was simple enough that it could 
be solved analytically, but with the key essence of pro- 
tein folding seen in detailed simulations. The qualitative 
change which derives from the simple addition of the role 
of non-native contacts shows how this model can easily 
be used to probe folding dynamics. By combining de- 
tailed simulations with analytic approaches, insight in a 
single system studied by simulation could be extended to 
a broad range of proteins and protein folding phenomena. 
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